Abstract

Advances in mass spectrometry-based proteomics technology are having a major impact on our understanding of how human spermatozoa acquire their capacity for fertilization. A complete analysis of the proteins found in human spermatozoa is essential for understanding the events leading up to and including fertilization and early embryo development. In this short review, we have collected the human sperm proteome from the literature and analyzed it with the Database for Annotation, Visualization and Integrated Discovery (DAVID) software. Bioinformatics analysis showed that the approximately 1,300 collected proteins are involved in various metabolic pathways, including catabolic processes. Additionally, the majority of the collected human sperm proteome was annotated as cytoplasmic. Application of multidimensional protein identification technology (MudPIT) is recommended to obtain better coverage of the hydrophobic and basic proteins of the human sperm proteome.
Introduction

Spermatogenesis is a unique process in males that produces haploid male germ cells from diploid progenitor cells. It includes two sequential meiotic divisions that convert one diploid primary spermatocyte, derived from a spermatogonium, into four haploid cells. Its final phase, spermiogenesis, is the specialized differentiation of haploid round spermatids into the highly specialized sperm cell, the spermatozoon. The function of the sperm is to deliver the paternal genome to the oocyte.

Identifying the protein molecules involved in sperm function, fertilization and early embryo development increases our knowledge of sperm biology, and this knowledge can be applied in reproductive medicine and in the treatment of some inborn genetic diseases to produce healthier offspring. The importance and easy accessibility of sperm cells have favored the study of their composition and of the mechanisms involved in their differentiation and function (1-5). Sperm were among the first cells whose protein content was studied: the pioneering work of Friedrich Miescher in 1874 led to the isolation and identification of protamine. More recently, mass spectrometry (MS)-based proteomics technology has further contributed to identifying the proteins that make up spermatozoa (6-12).

J Reprod Infertil. 2011;12(3):193-199

JRI

Human Sperm Proteome

In this short review, we focus on the proteins of human spermatozoa identified by MS proteomics technology. Methodologically, we considered for inclusion all articles retrieved by a PubMed search with the keywords "human", "sperm", "spermatozoa" and "spermatozoon", combined with the keyword "proteome", "proteomics" or "mass spectrometry". We analyzed the collected human sperm proteome with the Database for Annotation, Visualization and Integrated Discovery (DAVID) software, focusing in particular on enriched biological themes and gene ontology (GO) terms, and on discovering enriched functionally related gene groups (13).
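As a rough illustration, the keyword strategy above can be written as a single boolean PubMed query. The grouping used below (requiring "human", any one of the three cell terms, and any one of the three method terms) is our assumption, since the exact combination used for the search is not specified; `or_group` and `build_query` are hypothetical helper names.

```python
def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(organism, cell_terms, method_terms):
    """Combine the keyword groups with AND (assumed grouping, not the authors' stated query)."""
    return " AND ".join([f'"{organism}"', or_group(cell_terms), or_group(method_terms)])


query = build_query(
    "human",
    ["sperm", "spermatozoa", "spermatozoon"],
    ["proteome", "proteomics", "mass spectrometry"],
)
print(query)
```

The printed string can be pasted directly into the PubMed search box.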

Proteome definition: The proteome was originally defined as the protein complement of the genome. However, the definition has changed since it was first proposed by Wilkins et al. in 1995 (14). Today, the term 'proteome' has evolved to mean: "The proteome of an individual is defined by the sum and the time dynamics of all protein species occurring during the life-time of this individual". This definition encompasses the expression of individual proteins, their isoforms and their post-translational modifications (15).

Techniques used in human sperm proteome mapping: Several early reports used MS proteomics technology to identify a limited number of human sperm proteins by two-dimensional gel electrophoresis (2-DE) coupled to MALDI-TOF-MS analysis (16-23). An extensive human sperm proteome analysis using 1D SDS-PAGE combined with liquid chromatography electrospray tandem mass spectrometry (GeLC-MS/MS) identified 1,760 proteins (24); however, no protein list was published. The only comprehensive human sperm proteome analysis with a protein list available to date is the work of Baker et al. (25), who mapped 1,056 unique proteins from human spermatozoa using GeLC-MS/MS. A literature review of the techniques used for mapping human spermatozoa showed that two studies had used GeLC-MS/MS and eight had used 2-DE. To the best of our knowledge, no other techniques, including multidimensional protein identification technology (MudPIT) and combined fractional diagonal chromatography (COFRADIC), had been used for proteome profiling of human spermatozoa. A more extensive human sperm proteome could be obtained by combining different MS proteomics techniques, since we have shown that different techniques identify unique sets of proteins (26).
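Why combining techniques helps can be seen with simple set operations: the proteins found by only one technique are set differences, and the combined, non-redundant proteome is the union. The identifiers below are invented placeholders, not data from the cited studies.

```python
# Hypothetical protein identifiers (placeholders, not real accessions or study data).
two_de = {"P1", "P2", "P3", "P4"}
gelc_msms = {"P3", "P4", "P5", "P6", "P7"}

only_2de = two_de - gelc_msms        # identified by 2-DE alone
only_gelc = gelc_msms - two_de       # identified by GeLC-MS/MS alone
shared = two_de & gelc_msms          # identified by both techniques
combined = two_de | gelc_msms        # non-redundant combined proteome

print("2-DE only:", sorted(only_2de))
print("GeLC-MS/MS only:", sorted(only_gelc))
print("shared:", sorted(shared))
print(f"combined: {len(combined)} proteins, not {len(two_de) + len(gelc_msms)}")
```

Counting the union rather than adding list lengths avoids double-counting proteins identified by both techniques.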

How many proteins are expressed in human sperm?: One of the big questions in proteome analysis has been how large the human proteome is. The near-complete sequencing of the human genome has yielded total gene estimates that, at first glance, seem surprisingly low: on the order of 30,000 open reading frames (27, 28). However, when a gene is expressed, its products are subjected to alternative splicing and post-translational modifications. It is estimated that each gene could produce 5 to 6 mRNAs by alternative splicing, and each of these mRNA species is in turn translated into proteins that are processed in various ways, generating on the order of 8-10 different modified forms of each polypeptide chain. Thus, the human genome may potentially produce on the order of 1.8 million (30,000 × 6 × 10) different protein species (29). Defining each and every one of these proteins is the task that global collaborations such as the Human Proteome Organization (HUPO)¹ have set out to undertake.
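The 1.8 million estimate is a simple product of the per-gene multipliers quoted above. The short sketch below reproduces it from the upper bounds (6 mRNAs, 10 modified forms) and, for comparison, also computes the lower bound from 5 mRNAs and 8 forms; the variable names are ours.

```python
genes = 30_000               # open reading frames, order-of-magnitude estimate (27, 28)
mrnas_per_gene = (5, 6)      # alternative splicing variants per gene (low, high)
forms_per_chain = (8, 10)    # modified forms per polypeptide chain (low, high)

low = genes * mrnas_per_gene[0] * forms_per_chain[0]
high = genes * mrnas_per_gene[1] * forms_per_chain[1]

print(f"{low:,} to {high:,} protein species")  # prints: 1,200,000 to 1,800,000 protein species
```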

The question of how many proteins the spermatozoon, the most highly differentiated and unique cell type in the human body, contains is often posed in the literature (25, 30). It is, of course, difficult to predict the size of the human sperm proteome from existing proteomics data, given the current limitations of MS proteomics technology (26, 31-33). Nevertheless, Baker et al. (30) used available yeast proteome data to predict the number of protein species in human spermatozoa to be 2,000-2,500. As Baker et al. also point out, this is much lower than the number of proteins identified in bovine sperm (~4,000) (34); they argue, however, that the high number of proteins identified in the bovine sperm proteome is caused by false-positive identifications (30).

Collected human sperm proteome analyzed by DAVID: The collected human sperm proteome was functionally categorized based on Gene Ontology (GO) annotation terms using the Database for Annotation, Visualization and Integrated Discovery (DAVID) program package² (13, 35-37). For any gene or protein list, the DAVID tools can identify enriched biological themes (particularly GO terms), discover enriched functionally related gene groups, visualize genes or proteins on BioCarta and KEGG pathway maps, explore gene or protein names in batch, link gene-disease associations, and more. Approximately 1,300 human sperm proteins, the combined set identified by 2-DE and GeLC-MS/MS, were analyzed with DAVID.

1- http://www.hupo.org/

2- http://david.abcc.ncifcrf.gov/

Biological function of human sperm proteome: Table 1 shows the ten most significant biological-function categories from the DAVID analysis. DAVID was able to catalogue only 793 of the submitted proteins, which means that the biological functions of about 500 proteins in the collected human sperm proteome are still unknown. As shown in Table 1, the most significantly enriched biological function in the human sperm proteome is catabolic processes (16%, P = 1.6E-24), which include proteins for the breakdown of carbon compounds with the liberation of energy used for sperm movement. DAVID also identified glucose catabolic processes and oxidative phosphorylation, which are necessary for cellular homeostasis. The table also lists proteins belonging to spermatogenesis (3.6%) and spermiogenesis (0.9%).

Cellular component of human sperm proteome: Table 2 shows the top ten cellular-localization categories of the collected human sperm proteome from the DAVID analysis. DAVID was able to map 850 of the identified proteins; around 450 of the submitted proteins were categorized as having unknown localization.
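The "about 500" and "around 450" unannotated proteins quoted for Tables 1 and 2 follow from subtracting the DAVID-annotated counts from the approximately 1,300 submitted proteins. The sketch below makes that arithmetic explicit; because 1,300 is itself an approximation, the results are approximate as well.

```python
# Approximate number of proteins submitted to DAVID ("approximately 1,300" in the text).
SUBMITTED = 1_300

# Proteins DAVID could annotate in each analysis, as reported in the text.
annotated = {
    "biological function (Table 1)": 793,
    "cellular component (Table 2)": 850,
}

unannotated = {}
for analysis, n_annotated in annotated.items():
    unannotated[analysis] = SUBMITTED - n_annotated
    share = 100 * n_annotated / SUBMITTED  # percentage of submitted proteins annotated
    print(f"{analysis}: {n_annotated} annotated ({share:.0f}%), ~{unannotated[analysis]} unannotated")
# prints:
# biological function (Table 1): 793 annotated (61%), ~507 unannotated
# cellular component (Table 2): 850 annotated (65%), ~450 unannotated
```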

Surprisingly, the most enriched group in the collected human sperm proteome is cytoplasm (59%, P = 7.9E-48), even though human sperm are well known to lose most of their cytoplasm during spermiogenesis. A large number of proteins were categorized as mitochondrial (15.8%, P = 1.1E-32). This is not astonishing, since the midpiece of human sperm is rich in mitochondria. Additionally, protein groups associated with the sperm tail were identified as cytoskeleton (12.6%, P = 1.0E-9) and flagellum (1.5%, P = 6.6E-8).

As shown in Table 2, no transmembrane proteins were categorized in the collected human sperm proteome, although transmembrane proteins are important for sperm-oocyte interaction. This is probably due to the MS proteomics techniques used to map the human sperm proteome: it is well known that hydrophobic proteins, such as transmembrane proteins, rarely appear in gel-based analyses (26). Gel-free techniques such as MudPIT should provide deeper coverage of the human sperm proteome. However, MudPIT is not a straightforward technique and requires some expertise (38, 39).



Table 1. The ten biological functions with the greatest statistical significance for enrichment in the collected human sperm proteome data set (GOTERM: level ALL)

Biological function                                          %      P-value
Catabolic processes                                          16     1.6E-24
Proteasomal ubiquitin-dependent protein catabolic processes  4.1    4.4E-24
Proteasomal protein catabolic processes                      4.1    4.4E-24
Generation of precursor metabolites and energy               6.4    4.6E-20
Glucose catabolic processes                                  2.6    1E-16
Cell cycle processes                                         7.4    5.5E-12
Cell redox homeostasis                                       1.4    3.8E-5
Spermatogenesis                                              3.6    3.8E-5
Oxidative phosphorylation                                    1.6    3.5E-4
Spermiogenesis                                               0.9    1E-2

The percentage is calculated as the number of proteins involved divided by the total number of proteins, multiplied by 100. The enrichment P-value (compared with the theoretical human proteome) is calculated from the EASE score, a modified Fisher's exact test, and ranges from 0 to 1. A Fisher's exact P-value of 0 represents perfect enrichment. Usually, the P-value must be equal to or smaller than 0.05 for a category to be considered strongly enriched; the closer the value is to zero, the more enriched the category.
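The percentage and EASE score described in the footnote can be sketched as follows. This is a minimal illustration, assuming the usual one-sided Fisher's exact (hypergeometric) test for over-representation and an EASE penalty that removes one protein from the list's category count; DAVID's own implementation may build the contingency table differently. The function names and the example counts (3 of 300 list proteins, 40 of 30,000 background proteins) are hypothetical.

```python
from math import comb


def fisher_upper_tail(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric) P-value: the probability of
    observing k or more category members in a list of n proteins drawn from
    a background of N proteins, K of which belong to the category."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def ease_score(k, n, K, N):
    """EASE score as assumed here: the same test after removing one protein
    from the list's category count (k -> k - 1), which penalizes categories
    supported by very few proteins."""
    return fisher_upper_tail(max(k - 1, 0), n, K, N)


def percent_of_list(k, n):
    """Percentage column: category members divided by list size, times 100."""
    return 100 * k / n


# Hypothetical counts: 3 of 300 list proteins and 40 of 30,000 background
# proteins fall in the category.
hits, list_size = 3, 300
bg_hits, bg_size = 40, 30_000

p_fisher = fisher_upper_tail(hits, list_size, bg_hits, bg_size)
p_ease = ease_score(hits, list_size, bg_hits, bg_size)
print(f"{percent_of_list(hits, list_size):.1f}%  Fisher P = {p_fisher:.2g}  EASE = {p_ease:.2g}")
```

With these made-up counts the penalty raises the P-value from below 0.01 to about 0.06, so a category represented by only a handful of proteins can fail the 0.05 cutoff mentioned in the footnote even when the unpenalized test would pass it.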
Table 2. The top ten cellular components with the greatest statistical significance for enrichment in the collected human sperm proteome data set (GOTERM: level ALL)

Cellular localization                               %      P-value
Cytoplasm                                           59     7.9E-48
Mitochondrion                                       15.8   1.1E-32
Proteasome complex                                  3.6    4.5E-29
Mitochondrial part                                  9.7    1.1E-23
Intracellular                                       70     8.3E-22
Mitochondrial matrix                                5.5    3.6E-21
Cytoskeleton                                        12.6   1.0E-9
Flagellum                                           1.5    6.6E-8
Eukaryotic translation elongation factor 1 complex  0.5    3.9E-5
Cilia                                               1.7    1.4E-3

Explanations of the percentages and P-values are given in Table 1.



Functional categorization of the collected human sperm proteome: The most statistically significant functional annotations from DAVID were the acetylated protein (36.7%, P = 2.2E-89) and phosphoprotein (47.4%, P = 1.8E-16) groups. To our knowledge, most of the post-translationally modified proteins identified so far were identified using techniques such as 2-DE and GeLC-MS/MS (26, 40). However, the exact function of this large number of post-translational modifications is unknown (personal communication with M. Baker, author of the largest human sperm proteome published to date (25)).

Metabolic pathways enriched in the collected human sperm proteome: One of the functions of DAVID is to show enriched KEGG pathways. The most significantly enriched metabolic pathways in the collected human sperm proteome were proteasome (3%, P = 2E-22), fatty acid metabolism (1.6%, P = 3.3E-8), TCA cycle (1.4%, P = 5.8E-8), glycolysis/gluconeogenesis (1.9%, P = 7.7E-8) and pyruvate metabolism (1.4%, P = 1.9E-6). The enrichment of fatty acid and pyruvate metabolism is not surprising, since sperm are under hypoxic conditions.

Sperm: a silent cell?: One of the debates in sperm cell biology is whether any protein synthesis takes place in sperm cells (30, 41). Martinez-Heredia et al. (21) identified transcription factor proteins when mapping the human sperm proteome by 2-DE in the pI range 5-8. Additionally, in our DAVID analysis of the collected human sperm proteome, we were able to localize proteins to the eukaryotic translation elongation factor 1 complex (Table 2). However, these proteins must be confirmed by Western blotting before it can be concluded that protein synthesis actually takes place in sperm cells.

Conclusion 

Although sperm were among the first cells whose protein content was analyzed, the number of identified human sperm proteins is still limited compared with other samples, such as the brain proteome (7,792 proteins) or the proteome of the human neuroblastoma cell line SH-SY5Y (3,707 proteins) (42, 43). Deeper coverage of the human sperm proteome can be obtained using gel-free techniques such as MudPIT or COFRADIC (44, 45). It is well established today that gel-free techniques perform better than gel-based techniques in identifying basic, acidic and hydrophobic proteins (39, 46, 47). Chu et al. (48) used MudPIT to identify very basic proteins in the C. elegans sperm proteome that are impossible to identify by gel-based techniques. Additionally, it should be kept in mind that a proteome is much more complex than a genome. The absence of a particular protein from an MS proteomics list does not necessarily mean that it is absent from the spermatozoa of that species. Alternative explanations are that the proteomic coverage was incomplete, that the protein was present in too low abundance, or that it was missed by chance. Although the human sperm proteome is smaller and less complex than those of other cells, the functions of many identified human sperm proteins are still unknown. Immunolocalization can readily provide clues to their function by determining their location within the sperm and the expression pattern of the corresponding proteins. Knockouts, knockdowns and conditional knockdowns should further contribute to identifying their functions. As sperm proteomes from different species become available, comparison of conserved proteins and domains will also provide important clues to the essential conserved functions and evolution of sperm proteins.